
public/schema: add BigDecimal common type#421

Merged: Jeffail merged 4 commits into main from schema-bigdecimal-type on Apr 28, 2026
Jeffail (Collaborator) commented Apr 28, 2026

Summary

  • Adds a BigDecimal common type for arbitrary-precision decimals, complementing the fixed-precision Decimal shipped in #420 (public/schema: add Decimal common type with precision and scale). Use it for sources whose column metadata does not declare precision and scale: unparameterised Postgres NUMERIC, Oracle NUMBER with no DATA_PRECISION, MongoDB Decimal128.
  • New helpers: NewBigDecimal, FormatBigDecimal, and ParseBigDecimal. Unlike ParseDecimal, ParseBigDecimal recovers the scale from the input rather than taking it as a parameter.
  • Common.Validate now enforces a leaf check: every non-container type (every CommonType other than Object, Map, Array, Union) must have empty Children. This removes a pre-existing inconsistency where Validate enforced parameter rules but ignored structural ones.
  • ParseDecimal and ParseBigDecimal are now consistently lenient on non-canonical-but-unambiguous inputs (leading plus, leading zeros, missing integer part as in ".5") and strict on ambiguous or malformed inputs (scientific notation, multiple decimal points, whitespace, thousands separators, non-digit characters). Canonical form is asserted exclusively at the emit boundary by FormatDecimal / FormatBigDecimal, following Postel's law (be liberal in what you accept, conservative in what you emit).
  • public/schema/decimal_types.md extended with a "BigDecimal: arbitrary-precision decimals" section, per-format converter expectations (Avro/Parquet/Iceberg reject; JSON Schema permissive pattern), and a note on the parse/emit asymmetry.
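
The lenient/strict split and the scale recovery described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: parseSketch is a hypothetical name, and representing the value as an unscaled big.Int plus a scale is an assumption based on the return statement quoted in the review thread. How the real parser treats a trailing point such as "5." is not stated here; the sketch happens to accept it.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// parseSketch is a hypothetical stand-in for the PR's parser, not its actual
// code. It returns the unscaled integer and the scale recovered from the input.
func parseSketch(s string) (*big.Int, int32, error) {
	body, neg := s, false
	switch {
	case strings.HasPrefix(body, "-"):
		body, neg = body[1:], true
	case strings.HasPrefix(body, "+"): // lenient: leading plus
		body = body[1:]
	}

	// Split at the first point. A second point stays in fracPart and is
	// rejected by the digit check below.
	intPart, fracPart, _ := strings.Cut(body, ".")
	digits := intPart + fracPart // lenient: intPart may be empty, as in ".5"
	if digits == "" {
		return nil, 0, fmt.Errorf("failed to parse decimal value %q", s)
	}
	for _, r := range digits {
		if r < '0' || r > '9' { // strict: exponents, spaces, separators
			return nil, 0, fmt.Errorf("failed to parse decimal value %q", s)
		}
	}

	// Leading zeros are absorbed by the integer conversion.
	n, ok := new(big.Int).SetString(digits, 10)
	if !ok {
		return nil, 0, fmt.Errorf("failed to parse decimal value %q", s)
	}
	if neg {
		n.Neg(n)
	}
	// Fine for short inputs; a real parser must bound len(fracPart) first.
	return n, int32(len(fracPart)), nil
}

func main() {
	inputs := []string{"+007.50", ".5", "-12", "1e5", "1.2.3", " 1", "1,000"}
	for _, in := range inputs {
		n, scale, err := parseSketch(in)
		fmt.Printf("%q -> unscaled=%v scale=%d err=%v\n", in, n, scale, err)
	}
}
```

The first three inputs are non-canonical or plain but unambiguous, so the sketch accepts them; the remaining four fall into the strict categories listed in the summary and are rejected.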

Non-decimal schema fingerprints remain byte-stable; the new BigDecimal type is encoded by its type identifier and never carries logical params.

Test plan

  • task fmt
  • task lint (0 issues)
  • task test (full repo suite passes)
  • New unit coverage for BigDecimal ToAny/ParseFromAny round-trip, Validate rejection of children + Logical.Decimal, FormatBigDecimal / ParseBigDecimal happy and error paths, parse → format normalisation of non-canonical inputs, and the broader leaf-check across every leaf CommonType

BigDecimal carries an arbitrary-precision decimal value with no schema-level
precision or scale, complementing the fixed-precision Decimal type. Use it
for sources whose column metadata does not declare precision and scale —
unparameterised Postgres NUMERIC, Oracle NUMBER without DATA_PRECISION,
MongoDB Decimal128.

Adds:

- BigDecimal CommonType = 16 with its String/typeFromStr cases.
- NewBigDecimal, FormatBigDecimal, ParseBigDecimal helpers. ParseBigDecimal
  recovers the scale from the input rather than taking it as a parameter.
- A shared parseCanonicalDecimal helper between ParseDecimal and
  ParseBigDecimal so the accepted form stays consistent.
- A leaf-check in Common.Validate: every type other than Object, Map,
  Array, and Union must have empty Children. Removes a pre-existing
  inconsistency where Validate enforced parameter rules but ignored
  structural ones.
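
As a rough illustration of the leaf check described in the last item, the sketch below uses hypothetical stand-in types (kind, node) rather than the real Common and CommonType from public/schema. Recursing into the children of container types is an assumption of the sketch, not something the PR states.

```go
package main

import "fmt"

// kind is a hypothetical stand-in for the schema's type identifier.
type kind int

const (
	kindString kind = iota
	kindBigDecimal
	kindObject
	kindMap
	kindArray
	kindUnion
)

// node is a hypothetical stand-in for a schema node with optional children.
type node struct {
	Kind     kind
	Children []node
}

// isContainer reports whether a kind is allowed to carry children.
func (k kind) isContainer() bool {
	switch k {
	case kindObject, kindMap, kindArray, kindUnion:
		return true
	}
	return false
}

// validate applies the leaf check: any non-container kind must have no
// children. Walking into container children is an assumption of this sketch.
func (n node) validate() error {
	if !n.Kind.isContainer() && len(n.Children) > 0 {
		return fmt.Errorf("kind %d is a leaf and must not have children, got %d",
			n.Kind, len(n.Children))
	}
	for _, child := range n.Children {
		if err := child.validate(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	bad := node{Kind: kindBigDecimal, Children: []node{{Kind: kindString}}}
	fmt.Println(bad.validate())

	good := node{Kind: kindObject, Children: []node{{Kind: kindBigDecimal}}}
	fmt.Println(good.validate())
}
```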

Decimal parsers (ParseDecimal, ParseBigDecimal) are now lenient on
non-canonical-but-unambiguous inputs (leading plus, leading zeros, missing
integer part as in ".5") and strict on ambiguous or malformed inputs
(scientific notation, multiple decimal points, whitespace, thousands
separators, non-digit characters). Canonical form is asserted on the way
out by FormatDecimal/FormatBigDecimal — Postel applies.
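
The emit side of that asymmetry might look something like the sketch below. formatSketch and mustInt are hypothetical names, not the PR's FormatBigDecimal. The sketch assumes the value is held as an unscaled big.Int plus a non-negative scale, and that canonical output keeps trailing fractional zeros; neither detail is confirmed by this PR.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// mustInt builds a *big.Int from decimal digits. Demo helper only; it panics
// on malformed input.
func mustInt(s string) *big.Int {
	n, ok := new(big.Int).SetString(s, 10)
	if !ok {
		panic("bad integer literal: " + s)
	}
	return n
}

// formatSketch is a hypothetical formatter: it renders unscaled * 10^-scale
// with no leading plus, no redundant leading zeros, and always at least one
// digit before the point. It assumes scale >= 0.
func formatSketch(unscaled *big.Int, scale int32) string {
	digits := new(big.Int).Abs(unscaled).String()
	if scale > 0 {
		// Left-pad so at least one digit remains before the point.
		if pad := int(scale) + 1 - len(digits); pad > 0 {
			digits = strings.Repeat("0", pad) + digits
		}
		cut := len(digits) - int(scale)
		digits = digits[:cut] + "." + digits[cut:]
	}
	if unscaled.Sign() < 0 {
		digits = "-" + digits
	}
	return digits
}

func main() {
	// A lenient parse of "+007.50" would yield (750, 2); ".5" would yield (5, 1).
	fmt.Println(formatSketch(mustInt("750"), 2))
	fmt.Println(formatSketch(mustInt("5"), 1))
	fmt.Println(formatSketch(mustInt("-5"), 3))
	fmt.Println(formatSketch(mustInt("123456789012345678901234567890"), 10))
}
```

Under these assumptions, a non-canonical input is normalised simply by parsing it and formatting the result, which is why the parsers can afford to be lenient.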

Documents the BigDecimal type, the relaxed value contract, and the
parse/emit asymmetry in public/schema/decimal_types.md.
Comment thread public/schema/bigdecimal.go Outdated
@@ -0,0 +1,64 @@
// Copyright 2025 Redpanda Data, Inc.
Contributor


Suggested change
- // Copyright 2025 Redpanda Data, Inc.
+ // Copyright 2026 Redpanda Data, Inc.

Collaborator Author


Applied in 819a609 — thank you, @josephwoodward.

return nil, 0, fmt.Errorf("failed to parse decimal value %q", s)
}

return n, int32(len(fracPart)), nil
Contributor


Given len returns a 64-bit int, do we want to check it's within math.MaxInt32 and handle if not instead of it silently wrapping?

Collaborator Author


Good catch, @josephwoodward — addressed in 79ae0ab. The same wrap-around lurked on the ParseDecimal side too (its int32(len(fracPart)) > scale comparison would have short-circuited rather than caught the overflow), so the bound now lives in the shared parseCanonicalDecimal helper and both parsers benefit from the explicit error.

Comment thread public/schema/bigdecimal_test.go Outdated
len(fracPart) is a 64-bit int. The downstream casts to int32 (the scale
type) in ParseDecimal and ParseBigDecimal would wrap silently on a
fractional part longer than math.MaxInt32 — the BigDecimal path would
return a negative scale and the Decimal path would short-circuit its
"exceeds scale" check.

Bound the fractional length in parseCanonicalDecimal with an explicit
error so both parsers fail loudly rather than returning corrupt output.
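
The wrap-around and the guard can be illustrated with the sketch below. uncheckedScale and checkedScale are hypothetical helpers, not the code from 79ae0ab, and they take the fractional length as an int64 so the overflow case can be shown without building a multi-gigabyte input. A caller would pass int64(len(fracPart)); note that Go's int is 64 bits wide only on 64-bit platforms.

```go
package main

import (
	"fmt"
	"math"
)

// uncheckedScale mirrors a bare int32 conversion of a length. Go truncates
// out-of-range non-constant integer conversions silently.
func uncheckedScale(fracLen int64) int32 {
	return int32(fracLen)
}

// checkedScale is a hypothetical guard: it rejects lengths that cannot be
// represented as an int32 scale instead of letting them wrap.
func checkedScale(fracLen int64) (int32, error) {
	if fracLen > math.MaxInt32 {
		return 0, fmt.Errorf("fractional part has %d digits, maximum is %d",
			fracLen, math.MaxInt32)
	}
	return int32(fracLen), nil
}

func main() {
	tooLong := int64(math.MaxInt32) + 1

	// Without a bound, the conversion wraps to a negative scale.
	fmt.Println("unchecked:", uncheckedScale(tooLong))

	// With the bound, the caller gets an explicit error.
	_, err := checkedScale(tooLong)
	fmt.Println("checked:", err)
}
```
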
Jeffail merged commit ef3a242 into main on Apr 28, 2026
3 of 4 checks passed
Jeffail deleted the schema-bigdecimal-type branch on April 28, 2026 14:22

3 participants